1 Dataset description

There are two submissions: 10267 & 10270.

  • Each submission includes 2390 families with .vcf files.
  • Two .vcf files are provided for each family:
    • one named “sorted”,
    • the other named “annotated”.

1.1 Submission 10267

  • For files named “sorted”,
    • 852 families without GL/PL information
    • 1538 families with valid GL/PL information
      • 310 Trios
      • 1228 families with >=1 siblings
  • For files named “annotated”,
    • 1096 families without GL/PL information
    • 1294 families with valid GL/PL information
      • 309 Trios
      • 985 families with >=1 siblings

Note that for FID:13562, there is no father information in the .vcf file. Also, every family with valid GL/PL information in its “annotated” file also has valid GL/PL information in its “sorted” file.
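As an illustration of the GL/PL check, the sketch below scans the FORMAT column (the 9th tab-separated column of a VCF record) for a GL or PL key. It is only a minimal sketch: the criterion used here for "valid" GL/PL information is not spelled out, so this checks presence only, and the function name has_gl_or_pl and the max_records parameter are hypothetical.

```python
def has_gl_or_pl(lines, max_records=1000):
    """Return True if any of the first max_records variant lines lists GL or PL in FORMAT.

    `lines` is any iterable of VCF text lines (for example an open file handle).
    Assumption: presence of the key is taken as a proxy for "has GL/PL information".
    """
    checked = 0
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 8:
            keys = fields[8].split(":")  # FORMAT column, e.g. GT:AD:DP:GQ:PL
            if "GL" in keys or "PL" in keys:
                return True
        checked += 1
        if checked >= max_records:
            break
    return False
```

A call such as has_gl_or_pl(open(path)) would classify one family file; a stricter check could also verify that the per-sample values are not missing.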



1.2 Submission 10270

  • For files named “sorted”, there is no GL/PL information.
  • For files named “annotated”,
    • 703 families without valid GL/PL information
      • including 13 families with fewer than 2000 variants.
    • 1687 families with valid GL/PL information
      • 292 Trios
      • 1395 families with >=1 siblings


1.3 Combined

Note that, combining 10267 & 10270, there are 2206 families with complete vcf files.

  • 415 Trios
  • 1791 families with >=1 siblings


2 Call de novo mutations

Triodenovo was used to call de novo mutations:

  • Only variants with GL/PL information were retained.
  • Families were split into parents-offspring trios.
  • Filters: --minDP 7 --minDepth 10, with all other options left at their defaults.
  • Post-call filters (following Homsy et al. 2015, Science):
    • For offspring: at least 10 total reads and at least 5 alternate-allele reads, with an alternate-allele ratio of at least 20% if there are ≥10 alternate-allele reads, or at least 28% if there are <10.
    • For parents: at least 10 reference reads and an alternate-allele ratio <3.5%.
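The post-call read-depth filters above can be sketched as follows. This is an illustrative sketch, not the contents of call_deno.R: the function names are hypothetical, and it assumes that total reads equal reference plus alternate reads (for example from the AD field) and that the alternate-allele ratio is alt / (ref + alt).

```python
def alt_ratio(ref_reads, alt_reads):
    """Alternate-allele ratio, assumed here to be alt / (ref + alt)."""
    total = ref_reads + alt_reads
    return alt_reads / total if total > 0 else 0.0


def offspring_passes(ref_reads, alt_reads):
    """>=10 total reads, >=5 alt reads, ratio >=20% (alt >= 10) or >=28% (alt < 10)."""
    total = ref_reads + alt_reads
    if total < 10 or alt_reads < 5:
        return False
    min_ratio = 0.20 if alt_reads >= 10 else 0.28
    return alt_ratio(ref_reads, alt_reads) >= min_ratio


def parent_passes(ref_reads, alt_reads):
    """>=10 reference reads and an alternate-allele ratio below 3.5%."""
    return ref_reads >= 10 and alt_ratio(ref_reads, alt_reads) < 0.035


def dnm_passes(child, father, mother):
    """Each argument is a (ref_reads, alt_reads) pair for one trio member."""
    return (offspring_passes(*child)
            and parent_passes(*father)
            and parent_passes(*mother))
```

For example, dnm_passes((10, 6), (30, 0), (25, 0)) keeps the call, whereas a child with 20 reference and 6 alternate reads is rejected because its ratio (about 23%) falls below the 28% threshold that applies when there are fewer than 10 alternate reads.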

The script is stored at /scratch/90days/uqywan67/auti_proj/SSC/scripts/call_deno.R




3 Annotation

  • ANNOVAR was used to annotate variants with refGene and allele frequencies.
    • The hg19 refGene, exac03nonpsy, and gnomad_exome211 databases were used.
    • Based on the annotation, DNMs were further filtered to retain:
      • exonic or canonical splice-site variants
      • variants with MAF <= 0.001 both in the non-psychiatric subset of ExAC (Header: ExAC_nonpsych_ALL in ANNOVAR) and in the control samples of gnomAD (Header: controls_AF_popmax in ANNOVAR).
  • Gene-level pLI scores for PTVs were downloaded from ExAC.
  • MPC scores for missense variants were annotated using VEP.
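The annotation-based filter described above can be sketched as a row-level predicate. This is a sketch under stated assumptions: the function names are hypothetical; the variant function is assumed to be in the ANNOVAR column Func.refGene, with "exonic" and "splicing" taken to represent exonic and canonical splice-site variants; and a "." (a variant absent from the database) is assumed to count as frequency 0.

```python
CODING_FUNCS = {"exonic", "splicing"}  # assumed Func.refGene values to keep


def parse_af(value):
    """Convert an ANNOVAR frequency field to float; '.' (absent) is treated as 0."""
    if value in (".", "", None):
        return 0.0
    return float(value)


def keep_dnm(row, max_af=0.001):
    """row is a dict keyed by ANNOVAR column names for one annotated variant."""
    # Func.refGene may hold several values separated by ';', e.g. "exonic;splicing"
    funcs = set(row["Func.refGene"].split(";"))
    if not funcs & CODING_FUNCS:
        return False
    return (parse_af(row["ExAC_nonpsych_ALL"]) <= max_af
            and parse_af(row["controls_AF_popmax"]) <= max_af)
```

Matching on whole values rather than substrings means that a category such as ncRNA_exonic is not mistaken for exonic.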

3.1 DNMs summary

After applying the filters, a total of 4136 DNMs were found in 2430 offspring from 1758 families.

  • 3378/4136 (81.7%) DNMs matched published SSC DNMs from Krumm et al. 2015 and Iossifov et al. 2014.
  • 273 trio families (with 440 DNMs) and 1485 quad families (with 3696 DNMs, including 1855 DNMs in 1118 probands and 1841 DNMs in 1039 siblings).
  • 3592 DNMs in 2074 males and 594 DNMs in 356 females.
  • 2295 DNMs in 1391 probands and 1841 DNMs in 1039 siblings.
  • 2792 DNMs were absent from ExAC, 2861 DNMs were absent from gnomAD, and 2577 DNMs were absent from both datasets.

3.1.1 DNM counts

Note that individuals with more than 10 DNMs were excluded; this cutoff of 10 corresponds to the 99% quantile of per-individual DNM counts.
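A quantile-based exclusion of this kind could be sketched as below. This is a minimal sketch with hypothetical function names. It uses a simple nearest-rank quantile, which is an assumption: the quantile definition used in the analysis is not stated, and interpolating definitions (such as the default of R's quantile function) can give slightly different cutoffs.

```python
import math


def quantile_cutoff(counts, q=0.99):
    """Nearest-rank quantile: the smallest value with at least a fraction q of counts <= it."""
    ordered = sorted(counts)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def exclude_outliers(dnm_counts, cutoff):
    """Keep individuals whose DNM count does not exceed the cutoff.

    dnm_counts maps an individual ID to that individual's number of DNMs.
    """
    return {iid: n for iid, n in dnm_counts.items() if n <= cutoff}
```

With a cutoff of 10, an individual carrying exactly 10 DNMs is retained and one carrying 11 is dropped, which matches the "DNM counts > 10" exclusion rule.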



3.1.2 DNM mutation types



3.1.3 pLIs for PTVs



3.1.4 MPC scores for missense variants

3.2 DNMs in quad families

  • A total of 3696 DNMs were observed in 1485 quad families
    • 1855 DNMs in 1118 probands and 1841 DNMs in 1039 siblings
    • 3222 DNMs in 1886 males and 474 DNMs in 271 females.

3.2.1 DNM counts



3.2.2 DNM mutation types



3.2.3 pLIs for PTVs



3.2.4 MPC scores for missense variants



4 Burden test analysis